The use of lignocellulosic biomass as a feedstock for renewable energy production is
currently being optimized globally. Biomass is a complex mixture of cellulose,
hemicelluloses, lignins, extractives, and proteins, as well as inorganic salts. Cell wall
compositional analysis for biomass characterization is laborious and time consuming. In
order to characterize biomass quickly and efficiently, several high-throughput technologies
have been successfully developed. Among them, near infrared spectroscopy (NIR) and
pyrolysis-molecular beam mass spectrometry (Py-mbms) are complementary tools and 
capable of evaluating a large number of raw or modified biomass in a short period of 
time. NIR shows vibrations associated with specific chemical structures whereas Py-mbms
depicts the full range of fragments from the decomposition of biomass. Both
NIR vibrations and Py-mbms peaks are assigned to possible chemical functional groups
and molecular structures, and together they provide complementary chemical insight
into biomaterials. However, it is challenging to interpret the informative results because
of the large amount of overlapping bands or decomposition fragments contained in the 
spectra. In order to improve the efficiency of data analysis, multivariate analysis tools 
have been adapted to define the significant correlations among data variables, so that 
the large number of bands/peaks could be replaced by a small number of reconstructed 
variables representing original variation. Reconstructed data variables are used for sample 
comparison (principal component analysis) and for building regression models (partial least
squares regression) between biomass chemical structures and properties of interest. In
this review, the important biomass chemical structures measured by NIR and Py-mbms are 
summarized. The advantages and disadvantages of conventional data analysis methods and 
multivariate data analysis methods are introduced, compared and evaluated. This review 
aims to serve as a guide for choosing the most effective data analysis methods for NIR and 
Py-mbms characterization of biomass. 

Keywords: biomass characterization, lignocellulosic biofuel, near infrared spectroscopy, pyrolysis molecular beam
mass spectrometry, multivariate data analysis, high throughput, chemometrics



INTRODUCTION FOR BIOMASS CHEMICAL COMPOSITION 

Biomass is a complicated mixture of organic and inorganic 
compounds. It is mainly composed of cellulose, hemicelluloses 
and lignins, as well as minor components, such as proteins, extrac- 
tives, ash, and other nonstructural mineral materials. Because of 
its renewable nature and chemical composition, biomass is an 
attractive feedstock for energy and chemical products (Ragauskas
et al., 2006; Himmel et al., 2007; Wei et al., 2009; Sluiter et al.,
2010). In order to provide an effective guide for feedstock selection
and process development, it is very important to measure
biomass chemical composition accurately and efficiently (Sluiter
et al., 2010; Templeton et al., 2010; Daystar et al., 2013). In this
paper, we will review the use of two high-throughput techniques, 
near infrared spectroscopy (NIR) and pyrolysis-molecular beam 
mass spectrometry (Py-mbms) in biomass characterization. The 
advantages and disadvantages of different data analysis methods, 
including band/peak assignment, tools for spectral treatments and 



resolution enhancement and multivariate data analysis methods, 
are introduced, compared and evaluated. Selected research pub- 
lications are reviewed and categorized as "case studies" according 
to the ways they analyzed data and the specific biomass properties 
that are evaluated. 

CONVENTIONAL BIOMASS CHARACTERIZATION RELEVANT 
TO BIOFUEL PRODUCTION 

Traditional biomass compositional analysis, based on two-stage 
sulfuric acid hydrolysis followed by gravimetric and instrumental 
analysis, has been used to measure lignin and carbohydrates for 
more than 100 years. These methods have been used by researchers
for studies of wood materials, animal food, human health, bioen- 
ergy production, and many other areas related to biomaterials. 
The history and uses of these methods were reviewed in detail 
elsewhere (Sluiter et al., 2010). The analytical uncertainty for
different methods was also evaluated by statistical analysis and
reported as the standard deviation of measurement for each component
(Templeton et al., 2010). Other wet chemical techniques
include acidolysis, thioacidolysis, nitrobenzene oxidation,
transesterification, the acetyl bromide method, the orcinol method, and the Van
Soest method. Routine procedures, a number of less common
methods, and new analytical methods developed for research
purposes in the field of wood chemistry are described in books
(Browning, 1967; Sjöström and Alén, 1999). These techniques
quantify important chemical structures in biomass, but they are time
consuming and laborious.

Separately, combustion-related properties are of interest for the 
utilization of biomass in biofuel and biopower production. There 
are three types of combustion-related properties: morphological, 
physical, and chemical properties (Braadbaart and Poole, 2008). 
Traditional fuel analysis of biomass includes ultimate analysis, 
proximate analysis, and thermogravimetric analysis. In addition, 
ash composition and sulfur can be determined and used to predict 
fuel indices, especially for slagging behavior, aerosol formation, 
and corrosion related risks (Obernberger, 2014). 

USE OF SPECTROSCOPIC TOOLS IN BIOMASS 
CHARACTERIZATION AS HIGH THROUGHPUT TECHNIQUES 

Spectroscopic methods, such as Fourier transform infrared spec- 
troscopy (FTIR), NIR, Raman spectroscopy (Raman), and nuclear 
magnetic resonance (NMR), are widely used to measure functional 
groups and chemical bonds in biomass. These measurements 
are faster and more convenient than most conventional chemi- 
cal methods used for biomass characterization and fuel analysis. 
Besides, since there is no degradative chemical treatment used 
during analysis, the information gained from these tools is more 
representative of the chemical structures in original biomass. 
However, there are some drawbacks for using these spectroscopic 
tools. For example, data interpretation for FTIR, Raman, and 
NMR is relatively complicated, sample preparation can be com- 
plex, and due to the mixed nature of biomass, peak assignment 
usually suffers from the overlap of many compounds. A good
summary of spectroscopic tools used as high throughput techniques
in biomass study can be found in a recent review (Lupoi
et al., 2014).

HIGH THROUGHPUT TECHNIQUES COUPLED WITH 
MULTIVARIATE STATISTICAL ANALYSIS 

Because a single spectrum contains many chemical features,
it is challenging to interpret the data directly for a group of samples.
Therefore, multivariate analysis (MVA) tools have been widely
used in spectroscopic data analysis (Jin and Xu, 2011; Smith-Moritz
et al., 2011; Xu et al., 2013; Lupoi et al., 2014). Among
them, the two most widely used multivariate tools are (1)
principal component analysis (PCA) and (2) partial least squares
(PLS) regression.

Principal component analysis is mainly used for identifying
outliers, sample comparison, and screening. It relies on projecting
the original sample variables onto several (usually fewer than six) reconstructed
variables that are representative of the original sample variation.
Those reconstructed variables are known as principal components
(PCs). Samples described with PCs can be plotted in a scores
plot, in which similar samples cluster together while samples
different from each other are separated in two-, three-, or n-dimensional
coordinates. Together with the scores plot, the PCA
loadings plot allows for the determination of important chemical
features responsible for the sample grouping. In the loadings
plot, variables with large absolute values are highly correlated with sample
grouping (Sykes et al., 2009).
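
The projection onto principal components described above can be illustrated with a short sketch. It is not the procedure used in the cited studies: it assumes a small matrix of spectra (rows are samples, columns are bands or m/z peaks), uses NIPALS-style power iteration, and the function name `pca_nipals` and the example intensities are hypothetical.

```python
import math

def pca_nipals(rows, n_components=2, n_iter=200):
    """Return (scores, loadings, explained) for a small data matrix.

    rows: list of samples, each a list of variable values (hypothetical data).
    Assumes the data are not constant and have rank >= n_components.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # mean-center
    total_ss = sum(v * v for r in x for v in r)
    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start from the column with the largest remaining sum of squares.
        col = max(range(m), key=lambda j: sum(r[j] ** 2 for r in x))
        t = [r[col] for r in x]
        p = [0.0] * m
        for _iteration in range(n_iter):
            p = [sum(t[i] * x[i][j] for i in range(n)) for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]                          # unit-length loading
            t = [sum(a * b for a, b in zip(r, p)) for r in x]  # sample scores
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
        # Deflate: remove the variation captured by this component.
        x = [[x[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    return scores, loadings, explained

# Hypothetical intensities: two groups of samples that differ mainly in the second variable.
spectra = [[1.0, 5.0, 2.0], [1.1, 5.2, 2.0], [1.0, 1.0, 2.1], [0.9, 1.2, 2.0]]
scores, loadings, explained = pca_nipals(spectra, n_components=1)
```

In this toy case the first loading vector is dominated by the second variable, and the first two samples fall on the opposite side of the PC1 axis from the last two, which is the kind of grouping a scores plot is meant to reveal.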

Partial least squares regression is used to build predictive models
relating spectral data to the property of interest. In the
application of NIR and Py-mbms, spectral data are regarded as
"predictors" for the biomass properties of interest. The properties
of a new sample can then be estimated using a PLS model built
from spectral data taken on a set of similar samples with known
characteristics. In this way, time consuming experiments for new
samples can be avoided. Regression coefficients are generated
and can be used to relate chemical features in the spectra to the
specific sample properties (Labbe et al., 2006).
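
As a rough illustration of how such a calibration works, the sketch below fits a single-response PLS model (PLS1, NIPALS-style deflation) and predicts the property of a new sample. It is a minimal sketch under stated assumptions, not the implementation used in the cited work: the function names `pls1_fit` and `pls1_predict`, the two-band "spectra", and the lignin values are all hypothetical.

```python
import math

def pls1_fit(x_rows, y, n_components=1):
    """Fit a PLS1 model; returns (x_mean, y_mean, components)."""
    n, m = len(x_rows), len(x_rows[0])
    x_mean = [sum(r[j] for r in x_rows) / n for j in range(m)]
    y_mean = sum(y) / n
    x = [[r[j] - x_mean[j] for j in range(m)] for r in x_rows]
    f = [v - y_mean for v in y]  # centered response
    components = []
    for _component in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(x[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(r, w)) for r in x]  # scores
        tt = sum(v * v for v in t)
        p = [sum(t[i] * x[i][j] for i in range(n)) / tt for j in range(m)]
        q = sum(t[i] * f[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        x = [[x[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, row):
    """Predict the property for one new spectrum."""
    x_mean, y_mean, components = model
    x = [v - mu for v, mu in zip(row, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, p)]
    return y_hat

# Hypothetical calibration set: absorbance at two bands vs. lignin content (%).
calibration_x = [[0.10, 0.20], [0.20, 0.40], [0.30, 0.60], [0.40, 0.80]]
calibration_y = [18.0, 21.0, 24.0, 27.0]
model = pls1_fit(calibration_x, calibration_y, n_components=1)
predicted = pls1_predict(model, [0.25, 0.50])
```

Because the toy data are exactly linear, one component reproduces the trend; real spectra need several components, chosen by validation against samples held out of the calibration.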

In summary, multivariate tools used in spectroscopic data 
analysis have three functions: (1) comparing sample similarities 
and differences and discovering outliers; (2) building prediction 
models between spectroscopic data and biomass properties of 
interest; and (3) discovering correlations between property data 
and spectral data. 

BIOMASS CHARACTERIZATION BY NIR SPECTROSCOPY 

Near infrared spectroscopy is normally considered to cover the
range of the electromagnetic spectrum from 12,000 to 4000 cm⁻¹
(Smith-Moritz et al., 2011). This spectral region has two major
advantages: first, spectral acquisition is fast, which
facilitates real-time data collection for process control; second,
it is applicable to a diverse range of materials with
little or no sample preparation (Schwanninger et al., 2011). This
allows NIR to be effective for online monitoring and quality control
of a wide variety of product properties and manufacturing
processes (Workman, 2001; Kelley et al., 2004a; Tsuchikawa, 2007;
Jin and Xu, 2011). Because of this, NIR has been extensively used
as a high-throughput method to determine chemical, physical,
mechanical, and fuel properties of woody biomass during the past
20 years.
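
Because this review quotes the NIR range in wavenumbers but the band assignments in nanometers, a small conversion helper may be useful. The relation is the standard one (wavelength in nm equals 10^7 divided by wavenumber in cm⁻¹); the function names are hypothetical.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm1

def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm

# The NIR limits quoted above, expressed as wavelengths.
nir_short_nm = wavenumber_to_nm(12000.0)  # about 833 nm
nir_long_nm = wavenumber_to_nm(4000.0)    # 2500 nm
```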

However, there are some disadvantages to NIR. Although NIR
absorption spectra have similar patterns to those in the mid-IR,
they have wider separation, more anti-symmetry, and weaker
intensity because the NIR region contains the combination and overtone
bands of fundamental vibrations.
Therefore, the interpretation of NIR spectra is much harder than that of
mid-IR spectra (Schwanninger et al., 2011; Lupoi et al., 2014).

The utility of band assignments depends on the purpose 
of specific research or application. There is ongoing discus- 
sion around the necessity of interpreting NIR spectra in detail. 
Chemical/physical information contained in the NIR spectra can
be used for detailed analysis (Schwanninger et al., 2011). However,
it is not necessary to fully understand the chemical details
for NIR to be useful for quantitative analysis. If NIR is used
as a fast tool for distinguishing samples and for building prediction
models for biomass properties, detailed assignments
are generally not needed. Statistical analysis for extracting useful
information is essential for this purpose (Xu et al., 2013).
Meaningful scientific insight of structural information could be 



Frontiers in Plant Science | Plant Biophysics and Modeling 



August 2014 | Volume 5 | Article 388 | 2 



Xiao etal. 



Biomass NIR Py-mbms with multivariate 



better gained with the help of both statistical analysis and band 
assignments. 

NIR BAND ASSIGNMENT AND DATA PROCESSING 

In NIR analysis, data points are usually collected in reflectance
form (R) and converted to log10(1/R) form, which is equivalent to
an absorbance spectrum.
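
The conversion can be written out directly. The sketch below assumes reflectance is given as a fraction relative to a reference (so values must be positive); the function name and the example readings are hypothetical.

```python
import math

def reflectance_to_absorbance(reflectance):
    """Convert fractional reflectance values R to log10(1/R)."""
    absorbance = []
    for r in reflectance:
        if r <= 0.0:
            raise ValueError("reflectance must be positive to take log10(1/R)")
        absorbance.append(math.log10(1.0 / r))
    return absorbance

# Hypothetical reflectance readings at three wavelengths.
pseudo_absorbance = reflectance_to_absorbance([1.0, 0.5, 0.1])
```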

As stated above, knowledge regarding band assignment is 
important for the understanding of chemical structures in biomass 
and there are several references on NIR band assignments 
(Tsuchikawa et al., 2003; Schwanninger et al., 2011; Via et al.,
2013). Commonly assigned vibrations in the NIR spectra of woody
biomass include (Schwanninger et al., 2011):

(1) 1370-1471 nm: First and second overtones of O-H stretching 
vibrations from free or weakly bonded O-H in carbohydrates 
and first overtones of C-H, C(aromatic)-H stretching vibrations,
such as first overtone of O-H stretching in free OH group or 
OH group with a weak H-bond from cellulose, xylan, and glu- 
comannan (1386, 1414, 1428, 1471, 1477-1484), first overtone 
of O-H stretching in phenolic hydroxyl groups from extractive 
or lignin (1410, 1447, 1448), first overtone of C-H stretching 
and bending in aromatic associated C-H from lignin (1417, 
1440). 

(2) 1471-1632 nm: First overtone of O-H stretching from 
strong O-H bonded group, semi-crystalline and crystalline 
region of cellulose (1473-1632) or intramolecular H-bond in 
glucomannan (1471, 1493). 

(3) 1666-2000 nm: First overtone of aliphatic and aromatic C- 
H stretching vibrations and O-H combination bands from 
extractives/lignin (e.g., 1668, 1674, 1684, 1726), hemicellu- 
lose (e.g., 1720, 1724), cellulose (e.g., 1723, 1731), which are 
overlapped with each other and water band (e.g., 1887-2000). 

(4) ABOVE 2000 nm: Assignment in this region is difficult due to 
high number of possibilities for the coupling of vibrations. 

There are a number of well-established NIR spectral preprocessing
techniques that can be used to achieve resolution enhancement
and to locate band positions more precisely. Methods for spectral
data preprocessing include: (1) smoothing and derivatization
(Denoyer and Dodd, 2002; Rousset et al., 2011), for example with the
algorithm of Savitzky and Golay (1964); (2)
calculation of differential spectra (Rousset et al., 2011); and (3)
Fourier self-deconvolution and curve fitting (Ozaki et al., 2001), with
more advanced techniques involving PCA (Fackler and Schwanninger,
2010) and two-dimensional correlation analysis (Ozaki
et al., 2001; Schwanninger et al., 2011).

Among those preprocessing methods, derivatives are widely
used to reduce the impact of overlapping peaks and baseline variation.
However, there is a concern that generating derivatives can
introduce false information. Both the shape of the spectrum
and the data processing algorithms have an impact on band
shape and location. Band locations in the raw and second derivative
spectra can differ by more
than 20 cm⁻¹ (5 nm). Researchers have also reported that the second
derivative form was not always more precise than the normal
form for the prediction of lignin in wood (Michell, 1995; Xu et al.,
2013). Therefore, when spectral data are processed with the second
derivative, possible peak shifts should be taken into consideration.
The same consideration is also important when drawing conclusions
from processing spectra of PCA and regression coefficients from
PLS (Schwanninger et al., 2011).
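
To make the derivative step concrete, the sketch below applies a five-point, quadratic Savitzky-Golay second-derivative filter (convolution weights 2, -1, -2, -1, 2 divided by 7) to an evenly spaced spectrum, and converts a band shift from cm⁻¹ to nm. The window size, the function names, and the synthetic band are illustrative assumptions; the source does not state which window or polynomial order the cited studies used.

```python
import math

def savgol_second_derivative(y, spacing=1.0):
    """Five-point quadratic Savitzky-Golay second derivative.

    The two points at each edge cannot be filtered and are returned as None.
    """
    weights = (2.0, -1.0, -2.0, -1.0, 2.0)
    out = [None] * len(y)
    for i in range(2, len(y) - 2):
        acc = sum(w * y[i + k - 2] for k, w in enumerate(weights))
        out[i] = acc / (7.0 * spacing ** 2)
    return out

def shift_cm1_to_nm(shift_cm1, at_wavelength_nm):
    """Approximate a small wavenumber shift as a wavelength shift near a given band."""
    return at_wavelength_nm ** 2 * shift_cm1 / 1.0e7

# Synthetic single band centered at index 10 (hypothetical data).
band = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(21)]
d2 = savgol_second_derivative(band)
valid = [i for i, v in enumerate(d2) if v is not None]
band_center = min(valid, key=lambda i: d2[i])  # minimum of the second derivative

# A 20 cm^-1 shift corresponds to roughly 5 nm near 1580 nm; the wavelength
# at which the two figures agree is inferred from the arithmetic, not stated in the text.
shift_nm = shift_cm1_to_nm(20.0, 1580.0)
```

A band maximum appears as a minimum in the second derivative, which is why derivative spectra are read "upside down" when locating bands.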

NIR SPECTROSCOPY COUPLED WITH PCA 

The primary application of NIR coupled with PCA is to classify 
biomass samples of various origins or from different pretreat- 
ments without conducting laborious traditional wet chemistry 
techniques on all samples. Related areas of this application are 
summarized below: 

(1) Related to species/plant fractions (Michell, 1995; Kelley et al.,
2004a; Labbe et al., 2008a,b; Nkansah et al., 2010);

(2) Related to genetic engineering of feedstock crops (Bailleres
et al., 2002; Sandak and Sandak, 2011; Zhou et al., 2011);

(3) Related to chemical/thermal/biological treatments (Kelley
et al., 2004b; Yang et al., 2007; Houghton et al., 2009;
Krongtaew et al., 2010).

For example, in order to evaluate the impact of biomass pre- 
treatments (including acid and alkaline pretreatments, some in 
combination with hydrogen peroxide) on the change of cell wall 
compositions of wheat and oat straw, FT-NIR was utilized to 
characterize raw and pretreated straw (Krongtaew et al., 2010).
Second derivatives from NIR absorption bands were generated
and evaluated to show the changes in properties related to biomass
recalcitrance during subsequent bioethanol production. These
properties include changes in lignin and hemicelluloses, as well
as in the amorphous, semi-crystalline, and crystalline regions of cellulose
moieties of the pretreated samples. PCA of derivative data was
efficiently utilized to differentiate the alterations in chemical struc- 
ture of straw due to different pretreatment methods as shown in 
Figure 1. It was demonstrated that FT-NIR coupled with PCA is a 
powerful tool to assess biomass digestibility, with a potential to be 
used in process control in the area of biomass utilization or energy 
conversion. 

NIR SPECTROSCOPY COUPLED WITH PLS 

One of the main applications of NIR coupled with PLS is to
build regression models for the prediction of biomass properties,
such as lignin content, S/G-lignin ratio, moisture content, and heating
value (Kelley et al., 2004a; Rousset et al., 2011; Schwanninger et al.,
2011).

Related areas of the application of NIR coupled with PLS in
the existing literature are summarized below:

(1) Prediction of cell wall components (Michell, 1995; Sanderson
et al., 1996; Tucker et al., 2001; Bailleres et al., 2002; Kelley et al.,
2004a; Lovett et al., 2004; Yeh et al., 2004; Jin and Chen, 2007;
Labbe et al., 2008b; Philip Ye et al., 2008; Wolfrum and Sluiter,
2009; Nkansah et al., 2010; Hou and Li, 2011; Sandak and
Sandak, 2011; Smith-Moritz et al., 2011; Zhou et al., 2011).

FIGURE 1 | PCA scores plot of untreated wheat straw samples (•) and samples treated with acid (▼), alkali, acid/H2O2 (□), and alkali/H2O2 as
reproduced from literature (Krongtaew et al., 2010).

For example, in order to identify specific monosaccharide
outliers from a plant mutant population, FT-NIR coupled with
PLS regression was utilized to analyze plant leaves of Arabidopsis
(Smith-Moritz et al., 2011). Various Arabidopsis cell wall
mutants were analyzed for prediction model building. PCA was
performed on pre-processed and area-normalized NIR spectra,
followed by calculation of the Mahalanobis distance, a linear
discriminant analysis technique, to identify outliers using the PCA
results. Using this technique, a pilot study was conducted
that consisted of 550 mutant lines (3590 leaf samples),
resulting in a set of 235 leaf samples as Mahalanobis outliers.
Quantitative information about monosaccharide composition
was gained by means of PLS modeling with known biochemical
values and FT-NIR spectra. The correlation between
predicted and experimentally determined monosaccharide composition
(mol%) of 226 rice leaf samples is shown in Figure 2
with R² = 0.98 (Smith-Moritz et al., 2011).
(2) Prediction of other physical properties (Thygesen, 1994;
Hoffmeyer and Pedersen, 1995), mechanical properties (Kelley
et al., 2004a; Andre et al., 2006), and fuel properties (Lestander and
Rhen, 2005; Labbe et al., 2008a).

For example, NIR coupled with PLS has been used to predict
cell wall chemistry and mechanical properties of loblolly
pine from different radial locations and heights of trees
grown in Arkansas (Kelley et al., 2004a). Mechanical properties
include three-point bending test results and the related microfibril
angle. The correlation between experimental data and predicted
data from PLS modeling is very strong, with correlation
coefficients (r) as high as 0.80. A reduced spectral range (650-1150 nm)
usually available in handheld NIR spectrometers
was also demonstrated to be useful for predicting mechanical
properties.
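
The outlier screening mentioned in the Arabidopsis example (PCA followed by a Mahalanobis distance) can be sketched as follows. This is a simplified illustration, not the cited authors' code: it assumes the inputs are mean-centered, mutually uncorrelated PC scores, so the covariance matrix is diagonal and the distance reduces to a sum of squared scores scaled by their variances. The function names, the threshold of 2.0, and the example scores are hypothetical.

```python
import math

def mahalanobis_from_scores(scores):
    """Mahalanobis distance of each sample in PC-score space.

    scores: one list per principal component, each holding one score per sample.
    Assumes scores are mean-centered and uncorrelated between components.
    """
    n = len(scores[0])
    variances = [sum(v * v for v in comp) / (n - 1) for comp in scores]
    return [
        math.sqrt(sum(comp[i] ** 2 / var for comp, var in zip(scores, variances)))
        for i in range(n)
    ]

def flag_outliers(distances, threshold):
    """Return the indices of samples whose distance exceeds the threshold."""
    return [i for i, d in enumerate(distances) if d > threshold]

# Hypothetical PC1 scores for six leaf samples; the last one is far from the rest.
pc_scores = [[-1.0, -1.0, -1.0, -1.0, -1.0, 5.0]]
distances = mahalanobis_from_scores(pc_scores)
outliers = flag_outliers(distances, threshold=2.0)
```

In practice the threshold would be chosen from the distribution of distances in a reference population rather than fixed in advance.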



BIOMASS CHARACTERIZATION BY Py-mbms 

Py-mbms has been intensively used for studies of biological
and synthetic macromolecules, such as wood, grasses, carbon
in soil, and chars. It has proved to be an efficient and powerful
analytical tool (Evans and Milne, 1987; Kelley et al., 2002;
Labbe et al., 2005; Magrini et al., 2007; Sykes et al., 2008; Mann
et al., 2009; French and Czernik, 2010). Detailed descriptions of
this technology are available in the above references. In short,
the Py-mbms is composed of a pyrolysis furnace and a free-jet
mbms. Typically the furnace is preheated to 500°C before
a ground sample of biomass is inserted into the inert atmosphere
of the furnace. Pyrolysis products from biomass in the
furnace are swept out of the furnace into the mbms by an
argon gas stream. Molecular fragments contained in the pyrolysis
vapor are expanded through a series of vacuum chambers and
quenched, so that intermolecular collisions are prevented. A low-energy
electron beam (17-23 eV) in the triple quadrupole mass
spectrometer is employed to produce a positive ion mass spectrum.
The positive ion stream is amplified and collected by the
detector.

Mass peaks were assigned to chemical fragments produced
from fast pyrolysis of biomass for direct interpretation (Evans and
Milne, 1987). The spectra from Py-mbms are also interpreted with
the help of MVA tools, especially PLS and PCA (Hoover et al.,
2002; Kelley et al., 2002, 2004b; Labbe et al., 2005; Magrini et al.,
2007; Mann et al., 2009).

FIGURE 2 | Correlation of predicted (PLS model of FT-NIR)
versus experimentally determined monosaccharide composition
(mol%) of rice leaf samples. The correlation coefficient between
experimental and predicted values was calculated to be R² = 0.98 as
reproduced from literature (Agblevor et al., 1994; Smith-Moritz et al., 2011).



Py-mbms PEAK ASSIGNMENT AND DATA PROCESSING 

During data acquisition of Py-mbms, amplified positive ions from 
biomass pyrolysis vapor are scanned continuously; then the sig- 
nal is collected by a computer. Approximate evolution time of 
fast pyrolysis for a sample of 4 mg is less than 1 min. During 
the evolution time there are typically 50 single scans collected. 
Biomass with larger sample size will need longer evolution time 
and more scans during fast pyrolysis. Together with single scan 
spectrum, time resolved profile and averaged spectrum can be 
collected by the computer acquisition software (Evans and Milne, 
1987). 
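
The relationship between single scans, the time-resolved profile, and the averaged spectrum can be sketched as below. The representation is an assumption made for illustration: each scan is a list of intensities on a common m/z axis, and the function names and numbers are hypothetical; the actual acquisition software is not described in the source.

```python
def average_spectrum(scans):
    """Average intensity at each m/z position over all scans."""
    n_scans = len(scans)
    n_points = len(scans[0])
    return [sum(scan[k] for scan in scans) / n_scans for k in range(n_points)]

def time_resolved_profile(scans):
    """Total ion intensity of each scan, in acquisition order."""
    return [sum(scan) for scan in scans]

# Two hypothetical scans over three m/z positions.
scans = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
fingerprint = average_spectrum(scans)     # [1.0, 3.0, 5.0]
profile = time_resolved_profile(scans)    # [6.0, 12.0]
```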

Average spectra are also known as spectral "fingerprints." Spectral
fingerprints obtained at an analytical pyrolysis temperature of
500-550°C with molecular beam free-jet expansion represent
the primary products of biomass pyrolysis. Studies have shown
that in this temperature range, the molecular structure of the original
biomass is well preserved and no interaction is observed
among organic components during pyrolysis, although inorganics
may alter the pyrolysis pathways of the carbohydrates (Evans and
Milne, 1987). Thus, with known peak assignments, the spectral "fingerprints"
can be used to depict the molecular structure
of the chemical components in biomass. A summary of important
peak assignments in biomass is shown in Table 1 (Evans and Milne,
1987; Sykes et al., 2008). Characteristic spectral fingerprints of
whole biomass samples and separated constituents of biomass are
shown in Figure 3 (Evans and Milne, 1987).

Pyrolysis-molecular beam mass spectrometry has been suc- 
cessfully applied in many biomass-related studies, including the 
research of cellulose, cellulose with inorganics, many woods, xylan, 
milled wood lignin, bagasse (Evans and Milne, 1987), herbaceous 
biomass under different storage environments (Agblevor et al.,
1994), hardwood sawdust and its torrefaction products (Nimlos
et al., 2003), and poplar grown under different nitrogen conditions
(Sykes et al., 2009).



For example, in the study of bark phenolysis conducted by
Alma and Kelley, bark and its phenolysis products from Calabrian
pine, Lebanon cedar, acacia, and European chestnut were characterized
using Py-mbms (Alma and Kelley, 2002). The
Py-mbms averaged spectra showed that bark (1) has less of the
common lignin peaks at m/z 180, 194, and 210, assigned to coniferyl
alcohol/vinylsyringol, 4-propenylsyringol/ferulic acid, and sinapyl
alcohol, respectively; (2) has a unique triplet of peaks at m/z 96,
97, and 98 assigned to furans; and (3) has more phenols, such as peaks
at m/z 110, 124, 150, and 164 assigned to catechol, guaiacol,
vinyl guaiacol, and isoeugenol, respectively. In softwood bark, extractives
and lignin dimers can be identified at m/z 298, 300, 302, and
272, assigned to didehydroabietic acid, dehydroabietic acid, abietic
acid, and lignin dimer, respectively (Alma and Kelley, 2002).
These results are consistent with known differences between bark
and wood.

SELECTED PEAKS FROM Py-mbms RAW DATA 

As summarized above, certain Py-mbms peaks can be unambigu- 
ously assigned to specific biomass components. Lignin fragments 
are particularly easy to identify. Because of this, Klason lignin con- 
tent of biomass can be directly estimated from Py-mbms spectral 
fingerprints. Firstly, spectral fingerprints of samples are area/mean 
normalized for the mass of the original sample. Then, the total 
intensity of lignin related peaks from the normalized spectrum is 
calculated. After that, a correction factor is calculated by dividing 
the known Klason lignin value by the summed intensity of a NIST 
standard material. The correction factor can be used to convert 
the total intensity of lignin related peaks to Klason lignin content
(Davis and Lagutaris, 2002; Sykes et al., 2008, 2009; Ziebell
et al., 2013). Similarly, S/G ratios were determined by dividing
the sum of S-lignin peaks by the sum of G-lignin peaks, excluding
peaks associated with both S and G fragments (Davis and
Lagutaris, 2002; Sykes et al., 2008, 2009; Mann et al., 2009; Ziebell
et al., 2013).
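
The steps above (normalize, sum the lignin peaks, scale by a correction factor from a standard, and take the S/G ratio) can be sketched as follows. The peak lists follow the S and G labels of Table 1 as read here and are an assumption rather than the exact lists used in the cited studies; the spectra, the Klason value of the standard, and all function names are hypothetical.

```python
# Peak lists taken from the S/G column of Table 1 (assumed reading of the table).
S_PEAKS = (154, 167, 168, 182, 194, 208, 210)
G_PEAKS = (124, 137, 138, 150, 152, 164, 178)
SHARED_PEAKS = (180,)  # assigned to both S and G, excluded from the S/G ratio
LIGNIN_PEAKS = S_PEAKS + G_PEAKS + SHARED_PEAKS

def normalize(spectrum):
    """Scale a spectrum (dict of m/z -> intensity) to unit total intensity."""
    total = sum(spectrum.values())
    return {mz: value / total for mz, value in spectrum.items()}

def lignin_intensity(spectrum):
    """Summed normalized intensity of the lignin-related peaks."""
    norm = normalize(spectrum)
    return sum(norm.get(mz, 0.0) for mz in LIGNIN_PEAKS)

def correction_factor(standard_spectrum, standard_klason_percent):
    """Known Klason lignin of a standard divided by its summed lignin intensity."""
    return standard_klason_percent / lignin_intensity(standard_spectrum)

def estimate_lignin(spectrum, factor):
    """Estimated Klason lignin content (%) of a sample."""
    return factor * lignin_intensity(spectrum)

def s_to_g_ratio(spectrum):
    """Sum of S-only peaks divided by sum of G-only peaks."""
    s = sum(spectrum.get(mz, 0.0) for mz in S_PEAKS)
    g = sum(spectrum.get(mz, 0.0) for mz in G_PEAKS)
    return s / g

# Hypothetical spectra: m/z 60 stands in for all non-lignin signal.
standard = {60: 70.0, 124: 10.0, 154: 15.0, 180: 5.0}
sample = {60: 80.0, 124: 8.0, 154: 12.0}
factor = correction_factor(standard, standard_klason_percent=24.0)
lignin_percent = estimate_lignin(sample, factor)
sg = s_to_g_ratio(sample)
```

With these made-up numbers the standard has 30% of its signal in lignin peaks, so the factor is 80, and the sample (20% lignin signal) is estimated at 16% lignin with an S/G ratio of 1.5.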

For example, corrected lignin values and S/G-lignin ratios were
determined from Py-mbms for 800 greenhouse-grown poplar
trees grown under atmospheres containing different amounts of
nitrogen (Sykes et al., 2009). Lignin contents ranged from 13 to
28% whereas S/G ranged from 0.5 to 1.5. It was shown that 
the variations in cell wall composition were larger in the plants 
grown under high nitrogen conditions than those grown under 
low nitrogen conditions. 

Similarly, "within-tree" variability in lignin content and S/G
ratio with increasing height and increasing ring for poplars
was determined by Py-mbms (Sykes et al., 2008). Wood disks
from seven different poplar trees, which were seven years old,
were sampled at five different heights of 0.3, 0.6, 1.2, 1.8,
and 2.4 m from base to stem. Samples were collected from
the north side of each wood disk taken at a height of 1.2 m to
study differences between growth rings. According to results from
Py-mbms, the ring effect on lignin content was significant while
the effect of height was small. A higher S/G ratio was observed
with increasing ring size, whereas lignin content decreased.
The S/G ratio was also determined for switchgrass grown under different
environments using the same methodology (Mann et al.,
2009).

Table 1 | Peak assignments associated with Py-mbms spectrum for Populus wood based on literature (Evans and Milne, 1987; Sykes et al., 2008).

Mass peaks (m/z)           Assigned products                                       S or G precursor
57, 73, 85, 96, 114        From C5 sugar
57, 60, 73, 98, 126, 144   From C6 sugar
94                         Phenol, dimethylcyclopentene
108                        Methyl phenol (o-cresol, m/p-cresol)
110                        Dihydroxybenzene, 5-methylfurfural
120                        Vinylphenol
122                        Ethylphenol, ethylphenol, benzoic acid
124                        Guaiacol (2-methoxyphenol), trimethylcyclopentenone     G
137*                       Ethylguaiacol, homovanillin, coniferyl alcohol          G
138                        Methylguaiacol                                          G
150                        Vinylguaiacol, coumaryl alcohol                         G
152                        4-Ethylguaiacol, vanillin                               G
154                        Syringol (2,6-dimethoxyphenol)                          S
164                        Isoeugenol, eugenol                                     G
167*                       Ethylsyringol, syringylacetone, propiosyringone         S
168                        4-Methyl-2,6-dimethoxyphenol                            S
178                        Coniferyl aldehyde                                      G
180                        Coniferyl alcohol, syringylethene                       S, G
182                        Syringaldehyde                                          S
194                        4-Propenylsyringol                                      S
208                        Sinapyl aldehyde                                        S
210                        Sinapyl alcohol                                         S

*Fragment ion.
m/z: mass to charge ratio.
S, syringyl lignin; G, guaiacyl lignin.



Py-mbms COUPLED WITH PCA 

Pyrolysis-molecular beam mass spectrometry coupled with PCA
provides a fast analytical method to distinguish a large number of
biomass samples. It has been used to study biomass compositional
variations due to species (Evans and Milne, 1987; Agblevor et al.,
1994; Alma and Kelley, 2002; Kelley et al., 2004b), genetic engineering
(Labbe et al., 2005; Davis et al., 2006), different growth
environments (Mann et al., 2009; Sykes et al., 2009), thermal
(Nimlos et al., 2003), chemical (Alma and Kelley, 2002; Kelley
et al., 2004b), and biological (Kelley et al., 2002; Arantes et al., 2009)
treatments, and various storage/collection methods (Agblevor et al., 1994).

For example, Py-mbms coupled with PCA has been used to
measure the overall composition between and within a series of
original and transgenic aspens (Labbe et al., 2005). Two clones
were transformed with the GRP-iaaM gene (N1-17-26 and N1-2-1)
and GRP-iaaM/35S-ACCase (N2-4-9 and N2-5-5). PCA
was conducted in an attempt to identify chemical
differences between the modified and control aspens. Figure 4
shows PCA scores plots with four replicate samples from five different
aspen samples. Figure 4A shows a plot of PC1 versus PC2,
while Figure 4B shows a plot of PC2 versus PC3. In Figure 4A,
there is clear separation between the two N1 samples while the two
N2 samples are indistinguishable. Moreover, the two N2 samples are
clearly separated from each other along PC3 as shown in Figure 4B.
The loadings from PCA are shown in Figure 5. Using PC1 loadings
as an example, C5 carbohydrates (m/z 85 and 114) and lignin
(m/z 137, 180, 210, and 272) are highlighted for PC1. This suggests
there are more C5 sugars and less lignin in controls than in the
N1 and N2 samples (Labbe et al., 2005).

Pyrolysis-molecular beam mass spectrometry has also been
used to study the impact of storage environment on herbaceous
material. Weathered and unweathered fractions of three
types of herbaceous biomass after storage under 18 different conditions
for 6-9 months were analyzed by Py-mbms coupled
with PCA (Agblevor et al., 1994). Two major trends in the data
were shown by PCA (factor analysis): major clusters were distinguished
by relative nitrogen contents between switchgrass and
the other two herbaceous biomass samples, and weathered
and unweathered materials were clearly separated as subgroups
within the major clusters. According to the variance diagram
(similar to a loadings plot), a lower amount of carbohydrates constituted
the major chemical difference between weathered and
unweathered samples (Agblevor et al., 1994). This observation is
consistent with results from traditional wet chemical analysis and
Py-GC/MS.

FIGURE 3 | Characteristic mass spectral patterns of primary pyrolysis products for several whole biomass samples and for separated constituents of
biomass (Evans and Milne, 1987).

FIGURE 4 | Scores plot of PCA of Py-mbms data for original and transgenic aspens; (A) PC1 versus PC2; (B) PC2 versus PC3; N1 samples are clearly
separated from control samples in (A) while two N2 samples are not distinguishable; in (B) two N2 samples are clearly separated by PC3 as
reproduced from literature (Labbe et al., 2005).



In some cases, there is no separation of clusters in PCA scores 
plot. This indicates that there is no comprehensive difference 
among samples for the specific chemical features included in those 
particular PCs. 

For example, three transgenic clones of poplar (Populus) wood
were analyzed by Py-mbms, GC/MS, and traditional wet chemical
techniques to screen for possible variations in cell wall
composition due to genetic engineering (Davis et al., 2006).
Various Bacillus thuringiensis (Bt) gene-containing constructs
were used to transform poplar genotypes, and the transgenic
poplar was then compared with non-transgenic controls. PCA
results showed that there were generally no distinct groupings
of individual transgenic lines or non-transgenic controls,
indicating no significant differences in cell wall composition
between control and transgenic poplars (Davis et al., 2006).
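
The PCA workflow used in the studies above (mean-centering the spectra, projecting them onto a few PCs, and inspecting scores for clustering and loadings for the responsible channels) can be illustrated with a minimal sketch. This is not the procedure or software of any cited study: the function name `pca_scores_loadings`, the six synthetic channels, and the imposed group difference are all assumptions made for illustration, and PCA is computed here through a singular value decomposition with NumPy.

```python
import numpy as np

def pca_scores_loadings(spectra, n_components=2):
    """Mean-center spectra (rows = samples, columns = channels) and
    return scores, loadings, and the explained-variance ratio."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                 # one row per PC
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic, hypothetical data: two groups of five "spectra" with six channels.
rng = np.random.default_rng(0)
base = np.array([10.0, 8.0, 6.0, 5.0, 4.0, 3.0])
shift = np.array([0.0, 0.0, 3.0, 0.0, -3.0, 0.0])   # assumed group difference
control = base + rng.normal(0.0, 0.1, size=(5, 6))
modified = base + shift + rng.normal(0.0, 0.1, size=(5, 6))
spectra = np.vstack([control, modified])

scores, loadings, explained = pca_scores_loadings(spectra)

# Scores on PC1 separate the two groups; the largest absolute PC1
# loadings point at the channels that drive the separation.
top_channels = sorted(int(i) for i in np.argsort(np.abs(loadings[0]))[-2:])
print("PC1 scores:", np.round(scores[:, 0], 2))
print("channels with largest |PC1 loading|:", top_channels)
```

When the groups do not differ in any channel (set `shift` to zeros), the PC1 scores of the two groups overlap, which corresponds to the "no separation" case described above.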

Py-mbms COUPLED WITH PLS 

One of the primary applications of Py-mbms has been the
development of prediction models for biomass compositional
properties. Results from conventional methods of cell wall
compositional analysis were used as references to build
calibration models capable of predicting the composition of
future samples. As a result, laborious wet chemistry analysis
can be avoided for new samples. PLS regression is widely used in
this arena for both woody (Tuskan et al., 1999; Labbe et al.,
2005) and herbaceous biomass (Agblevor et al., 1994; Kelley
et al., 2004b; Mann et al., 2009).

FIGURE 5 | Loadings from PCA of Py-mbms data for original and
transgenic aspens; from top to bottom: PC3, PC2, PC1. C5
carbohydrates (m/z 85 and 114) and lignin (m/z 137, 180, 210, and 272)
are highlighted for PC1. Reproduced from the literature (Labbe et al.,
2005).

For example, the effectiveness of NIR and Py-mbms in predicting
cell wall composition of various agricultural residues was
tested (Kelley et al., 2004b). Forty-one samples from 14 species
with known contents of lignin and six individual sugars were
analyzed by NIR and Py-mbms. Prediction models were built
between spectral data from both techniques and cell wall
compositional data. Correlation coefficients and root mean
square errors for each calibration and validation model were
reported and compared. Good correlations between the predicted
and measured values of major components (lignin, glucose,
xylose, and mannose) were obtained (correlation coefficients of
both calibration and validation models are above 0.80 for both
NIR and Py-mbms), while correlations for minor sugars
(galactose, arabinose, and rhamnose) were not as good. A summary
of PLS prediction of chemical composition from Py-mbms is
presented in Table 2. According to the authors, more samples for
specific feedstocks are needed to build improved models. This
work also included a thorough comparison between NIR and Py-mbms
(Kelley et al., 2004b).
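
The statistics used to judge such models, the correlation coefficient (r) between measured and predicted values and the root mean square error (RMSE) of calibration or prediction, can be computed as in the following minimal sketch. The function names and the measured/predicted values are illustrative assumptions, not data from the cited work. RMSE is computed here with a divisor of n; some chemometrics packages apply a degrees-of-freedom correction instead.

```python
import math

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)

def rmse(measured, predicted):
    """Root mean square error, using a divisor of n (an assumption)."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))

# Hypothetical lignin contents (wt%) from wet chemistry and from a model.
measured = [20.0, 22.0, 25.0, 27.0, 30.0]
predicted = [21.0, 21.0, 26.0, 26.0, 31.0]

r_value = pearson_r(measured, predicted)
rmse_value = rmse(measured, predicted)
print(f"r = {r_value:.3f}, RMSE = {rmse_value:.2f}")
```

Applied to the calibration set these functions give r(CALB) and RMSEC; applied to samples held out of calibration they give r(VALD) and RMSEP.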



Beyond predicting the cell wall composition of biomass, PLS has
been applied to predict other biomass properties and processing
parameters. The acidic phenolysis conditions of bark (Alma and
Kelley, 2002), weight loss during fungal degradation of spruce
(Kelley et al., 2002), and carbon content/fraction of different
soils (Hoover et al., 2002; Magrini et al., 2007) were also
predicted by Py-mbms coupled with PLS.

For example, NIR and Py-mbms were utilized to monitor the
chemical changes of wood undergoing brown-rot degradation. In
this case, spruce blocks were infected by Postia placenta or
Gloeophyllum trabeum for 0, 2, 4, 8, and 16 weeks (Kelley
et al., 2002). Weight losses over the time period were monitored
and recorded, and PLS models were built to predict weight loss.
Strong correlations between recorded and predicted weight loss
were obtained (correlation coefficients of the calibration model
reached 0.98, while those of the test model reached 0.96 for
both NIR and Py-mbms). The regression coefficients of the PLS
model from Py-mbms data show that weight loss during decay is
positively correlated to carbohydrates (m/z 85, 114, and 126)
and negatively correlated to monomethoxylated lignin fragments
(m/z 123, 138, and 151; Kelley et al., 2002).
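
How the signs of PLS regression coefficients indicate which spectral channels rise or fall with a predicted property can be shown with a minimal sketch. It uses a textbook NIPALS PLS1 algorithm written with NumPy, which is not necessarily the implementation used in the cited study. The function name `pls1_fit`, the three synthetic channels, and the simulated response are assumptions for illustration only.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean              # centered working copies
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))     # weights
    P = np.zeros((n_features, n_components))     # X loadings
    q = np.zeros(n_components)                   # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                               # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)                 # deflate X
        yk = yk - q[a] * t                       # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic, hypothetical data: the response rises with channel 0,
# falls with channel 1, and is unrelated to channel 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.05 * rng.normal(size=30)

coef, intercept = pls1_fit(X, y, n_components=2)
pred = X @ coef + intercept
r_fit = np.corrcoef(y, pred)[0, 1]
print("regression coefficients:", np.round(coef, 2))
```

A positive coefficient marks a channel whose intensity increases with the predicted property, and a negative coefficient marks one whose intensity decreases with it, which is the reading applied to the carbohydrate and lignin fragments above.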

CONCLUSION 

Compared to traditional techniques in biomass characterization,
high-throughput analytical techniques such as NIR and Py-mbms
have proven to be efficient tools for exploring the chemical
features of different biomass samples with minimal sample
preparation. Coupled with MVA, these high-throughput techniques
have been demonstrated to be efficient in identifying outliers,
comparing samples (using PCA), and building prediction models
(using PLS). Both NIR and Py-mbms coupled with MVA could be used
not only for characterizing the cell wall chemistry, but also
for predicting other chemical, physical, mechanical, and fuel
properties. In comparison with Py-mbms, NIR has the advantages
of low cost, simple instrumentation, field portability, and
nondestructive measurement, whereas Py-mbms provides superior
molecular structural information.

Thus, we recommend that NIR and Py-mbms coupled with MVA be
widely employed for biomass characterization. Additional
fundamental work on assigning NIR vibration bands and Py-mbms
peaks for modified biomass or biomass-related products is
recommended, since current assignments are mainly based on the
study of unmodified biomass. The lack of assignments for new
bands/peaks in modified biomass limits the application of these
two techniques in exploring the fundamental changes in chemical
composition of modified biomass. Also, comparison and
correlation between analytical results from Py-GC/MS and Py-mbms
should be encouraged, because understanding the important
similarities and differences between these two techniques is
critical for using them to characterize biomass molecular
structure.

Table 2 | Summary of the PLS-2 predictions of chemical composition from Py-mbms (six PCs; Kelley et al., 2004b).

          Lignin  Glucose  Xylose  Mannose  Galactose  Arabinose  Rhamnose
r(CALB)     0.85     0.85    0.87     0.92       0.83       0.70      0.80
r(VALD)     0.77     0.75    0.81     0.86       0.65       0.54      0.71
RMSEC       4.60     6.20    3.40     1.40       0.40       0.50      0.10
RMSEP       5.50     8.00    4.10     1.80       0.50       0.60      0.10

r(CALB), r(VALD): correlation coefficients of calibration and validation; RMSEC, RMSEP: root mean square errors of calibration and prediction.